‘GPU-accelerated Multiphysics Simulation’
Abstract
General Purpose computation on Graphics Processing Units (GPGPU) has recently been recognized as a viable HPC technique. In this context, GPU acceleration is rooted in high-order Single Instruction Multiple Data (SIMD)/Single Instruction Multiple Thread (SIMT) vector-processing capability, combined with high-speed asynchronous I/O and a sophisticated parallel cache memory architecture. In this presentation we examine the enParallel, Inc. (ePX) approach to leveraging this technology for accelerated multiphysics computation {1}{2}{3}. As is well understood, both complexity and size limit realizable multiphysics simulation performance. Multiphysics applications by definition incorporate diverse model components, each of which employs characteristic algorithmic kernels (e.g. sparse/dense linear solvers, gradient optimizers, multidimensional FFT/IFFT, wavelet transforms, random variate generators). This complexity is further increased by any requirement for structured communications across module boundaries (e.g. dynamic boundary conditions, multi-grid (re)discretization, and management of disparate time-scales). Further, multiphysics applications tend toward large scale and long runtimes due to (a) the presence of multiple physical processes and (b) high-order discretization as a result of persistent nonlinearity, chaotic dynamics, etc. It follows that acceleration is highly motivated, and that any associated performance optimization scheme must be sophisticated enough to address all salient aspects of process resource mapping, scheduling, and data movement. For the GPU-accelerated cluster, this is a particularly important consideration because the GPU adds a degree of freedom to any choice of processing resource; multiphysics performance optimization then reduces to the goal of achieving the highest possible effective parallelism across all available HPC resources, each of which is associated with a characteristic process model.
In ePX applications, processing models are organized hierarchically so as to structurally minimize high-overhead interprocess communications; process optimization is then performed based upon an assumed scatter-gather principle applied recursively at the distributed (cluster) and Symmetric Multi-Processor (SMP; multicore CPU) hierarchy levels. This approach supports flexible optimization across all physics modules. In particular, explicit pipelining of cluster, CPU, and GPU processes is implemented through asynchronous transaction calls at an associated Application Programming Interface (API). This generally improves effective parallelization beyond what might otherwise be possible. Further, a complete multiphysics application must be accelerated consistent with the dictates of Amdahl’s Law: any portion left serial bounds the achievable overall speedup. In this context, ePX is shown to exhibit a full-featured supercomputer processing model, highly optimized for GPU-accelerated clusters or workstations, and particularly well suited to multiphysics applications. The ePX framework is then presented as a generic and reusable multiphysics development solution that features scatter-gather infrastructure as a software architectural component and obviates any need for specialized compilation technology or OS runtime support.
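The recursive scatter-gather principle and the Amdahl's Law constraint described above can be illustrated with a minimal Python sketch. This is not the ePX API, which the abstract does not specify: the names `scatter_gather`, `split`, `kernel`, and `amdahl_speedup` are hypothetical, a thread pool stands in for each hierarchy level (cluster, then SMP), and a sum of squares stands in for a physics kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Amdahl's Law: ideal speedup when only part of the runtime parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def split(items, n_parts):
    """Scatter step: cut a list into at most n_parts contiguous chunks."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def kernel(chunk):
    """Hypothetical leaf kernel standing in for a CPU or GPU task."""
    return sum(x * x for x in chunk)


def scatter_gather(items, fanouts):
    """Apply scatter-gather recursively, one hierarchy level per fanout entry."""
    if not fanouts or len(items) <= 1:
        return kernel(items)
    chunks = split(items, fanouts[0])
    # Each level owns its pool (assumption: threads model nodes or cores).
    with ThreadPoolExecutor(max_workers=fanouts[0]) as pool:
        # submit() returns immediately, so all chunks are in flight together.
        futures = [pool.submit(scatter_gather, c, fanouts[1:]) for c in chunks]
        return sum(f.result() for f in futures)  # gather step


if __name__ == "__main__":
    data = list(range(1000))
    print(scatter_gather(data, fanouts=[4, 2]))  # 4 "nodes", 2 "cores" each
    print(amdahl_speedup(0.95, 8))
```

Here `fanouts=[4, 2]` models a two-level hierarchy; the asynchronous `submit` calls are a loose analogue of the asynchronous transaction calls mentioned in the text, and `amdahl_speedup` shows why the whole application, not just individual kernels, has to be accelerated.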
Similar resources
Multi-GPU Computing with Abaqus: Benchmarking and scaling for multiphysics applications in mechatronics
Mechatronic systems encountered in the power and automation industries exhibit very complex behavior for a variety of applications. These problems require solutions derived from diverse physical phenomena, and hence are considered to be multiphysics problems. One such problem includes computing the coupled electromechanical response of electroactive polymer actuators. Due to the complex nature ...
Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters
Computational fluid dynamics simulations are in general very compute intensive. Only through parallel simulation on modern supercomputers can the computational demands of complex simulation tasks be satisfied. Facing these computational demands, GPUs offer high performance, as they provide high floating-point performance and high memory-to-processor bandwidth. To successfully utilize GPU clusters...
GPU Accelerated Direct Kinetic Simulation Code for Collisionless Plasma Expansion
Collisionless plasma expansion is a fundamental physics problem in plasma science and has a great impact on engineering applications such as fusion and electric propulsion. Though much more computationally expensive, kinetic approaches are required for both electrons and ions in order to accurately solve collisionless plasma problems. The grid-based direct kinetic simulation code with GP...
Parallel Implementation for Phase-Field Simulation of Flow Effect on Dendritic Growth with GPU Acceleration
A Sola-phase-field model, which combines the Sola algorithm with a phase-field model, is established. Real-time simulation becomes difficult as the number of computational grid points increases. Taking pure SCN as an example, the solidification microstructure evolution process in the presence of flow has been accelerated on a GPU with CUDA programming. The GPU implementation of the Sola-phase field model is introduc...
Toward GPU-accelerated traffic simulation and its real-time challenge
Traffic simulation is a growing domain of computational physics. Many life and industrial applications would benefit from traffic simulation to establish reliable transportation systems. A core challenge of this science research, however, is its unbounded scale of computation. This paper explores an advantage of using the graphics processing unit (GPU) for this computational challenge. We study...